MAP KINASE PHOSPHATASE GENE AND USES THEREOF 



This invention relates to two genes encoding novel proteins that 
possess threonine/tyrosine phosphatase characteristics, to the 
proteins themselves and to methods for their recombinant production. 
The encoded proteins are located in the cytoplasm, which is a novel 
feature of this class of phosphatases. 

Protein tyrosine phosphatases (PTPs) are a growing family of 
enzymes which play an important role, together with protein 
tyrosine kinases, in many cellular processes such as cell 
division, proliferation and differentiation 1-3. The PTP family 
can be sub-divided according to structural features which 
determine whether they are transmembrane or cytoplasmically 
located. All PTPs contain a catalytic domain consisting of a 
highly conserved active site with the consensus sequence 
[I/V]HCXAGXXR[S/T]G (where X represents any amino acid). The 
regions flanking the catalytic domain of the PTPs are diverse and 
consist of sequences which appear to target the PTPs to specific 
cellular locations. 
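
The consensus sequence above lends itself to a simple pattern search. The following is a minimal sketch, assuming a protein sequence supplied as a one-letter-code string; the function name, the pattern names and the toy sequence are hypothetical and are not taken from the figures. A relaxed pattern is included because the specification states elsewhere that the M3/6 domain follows VHCXAGXXRSX, with the final G replaced by A.

```python
import re

# Strict consensus from the text: [I/V]HCXAGXXR[S/T]G, X = any amino acid.
PTP_CONSENSUS = re.compile(r"[IV]HC.AG..R[ST]G")

# Relaxed form stated for the claimed phosphatase: VHCXAGXXRSX.
M36_VARIANT = re.compile(r"VHC.AG..RS.")


def find_active_sites(protein, pattern=PTP_CONSENSUS):
    """Return (1-based start position, matched motif) for every hit."""
    seq = protein.upper()
    return [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence used only to exercise the function.
    toy = "MKT" + "VHCSAGVGRSG" + "LLA"
    print(find_active_sites(toy))
    # The preferred M3/6 domain ends in A, so only the relaxed pattern hits.
    print(find_active_sites("VHCLAGISRSA"))
    print(find_active_sites("VHCLAGISRSA", M36_VARIANT))
```

A regular expression is used here because the motif has fixed length and fixed positions; it reports overlapping-free, left-to-right matches only.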

Amongst the superfamily of tyrosine phosphatases is a sub-family 
of dual specificity phosphatases, so-called because they can 
dephosphorylate substrates which are phosphorylated on 
serine and threonine as well as tyrosine residues. Several of 
these enzymes can dephosphorylate and deactivate MAP kinases 
(mitogen-activated protein kinases). Genes for some of these MAP 
kinase phosphatases are known. 

The mechanism by which extracellular signals for growth and 
differentiation are transmitted to the nucleus to alter gene 
expression is the subject of much current investigation. In many 
cases, the transduction of these signals requires the activities 
of key enzymes known generally as MAP kinases. MAP kinases 
have been implicated in a large number of signal 
transduction pathways. For instance, the activation of MAP 
kinases has been observed during growth factor stimulation of DNA 
synthesis and during differentiation, secretion and stimulation 
of glycogen synthesis 4. MAP kinase has been shown to 
phosphorylate and activate effector substrates such as the 
transcription factors c-jun and elk-1. Known MAP kinases, and 
the pathways in which they are involved, have been reviewed 15. 

MAP kinase is activated by phosphorylation of threonine and 
tyrosine by a dual specificity kinase, "MAP kinase kinase". This 
kinase is in turn activated by phosphorylation by "MAP kinase 
kinase kinase", one form of which is the proto-oncogene c-raf. 
The activation of c-raf is not fully understood at present but 
apparently there is a requirement for an interaction with 
GTP-bound p21 ras protein 6. 

The full picture of how MAP kinase pathways are switched off is 
not yet clear. Down-regulation of MAP kinase activity by 
dephosphorylation is likely to be of key importance. The human 
gene CL100 7 and its murine homologue 3CH134 8 were originally 
discovered as genes whose transcription was stimulated by growth 
factors, oxidative stress and heat shock. Subsequently, they 
were shown to encode polypeptides that have both serine/threonine 
and tyrosine phosphatase activity 5,10 on MAP kinase. This removal 
of phosphate from both threonine and tyrosine on MAP kinase is 
unusual. When expressed in vitro 6 the CL100 10 gene product has 
been shown to be very specific for MAP kinase and leads to its 
inactivation. Co-expression of the murine gene 3CH134 and the 
erk2 MAP kinase isoform in mammalian cells leads to the 
dephosphorylation and inactivation of the MAP kinase 11. 
Furthermore, it has been shown recently that this phosphatase 
gene can also block cellular DNA synthesis induced by an 
activated version of the ras oncogene in rat embryo 
fibroblasts. 

For present purposes, the terms "Mitogen-activated protein 
kinase", "MAP kinase" and "MAPK" all apply to protein kinases 
that are activated by dual phosphorylation on threonine and 
tyrosine residues. This may be in response to a wide array of 
stimuli. Different MAP kinases are activated in response to 
different extracellular stimuli, including (depending on the MAPK) 
stress, osmotic stress, mating pheromone (in yeast), growth 
factors, TNF, IL-1 and LPS. MAP kinases include SMK1, HOG1, 
MPK1, FUS3/KSS1, spk1, ERK1/ERK2, JNK/SAPK and p38. "MAP kinase 
phosphatase" activity or function is the ability to 
dephosphorylate one, or sometimes both, of the threonine and 
tyrosine residues on a MAP kinase, which residues are 
phosphorylated on activation of the MAP kinase. Thus, MAP kinase 
phosphatases are capable of hydrolysing either, or both, 
phosphothreonine and phosphotyrosine residues on a MAP kinase. 

Martell et al. (October 1995) J. Neurochem. 65: 1823 describe 
the cloning of a protein tyrosine phosphatase abundant in brain. 
Theodosiou et al. (1996) Hum. Molec. Genet. 5: 675 report the 
cloning of the murine M3/6 cDNA which is also described herein. 

The present invention has arisen from the characterisation of a 
region in cosmids corresponding to yeast artificial chromosomes 22 
during which a series of cDNA clones were identified. One of 
these, designated M3/6, was isolated from a mouse adult brain 
cDNA library. The cDNA of the invention shows homology to a 
family of phosphatases and appears to define a new subfamily of 
phosphatases. Significantly, this cDNA contains a translated 
complex repeat at its 3' end which may be polymorphic. 

We have thus surprisingly found two genes that encode new 
proteins that appear to be new members of the dual-specificity 
phosphatase family. We have called the new murine protein M3/6. 
A cDNA sequence of murine M3/6 is presented as Figure 1. Another 
cDNA sequence of the gene is presented as Figure 2, together with 
a translation of the open reading frame. All amino acid 
sequences represented herein are represented in the conventional 
N- to C- terminal direction, in the standard one letter code, 
unless this is specified to the contrary. 

The partial cDNA sequence of the human homologue, Hb5, is shown 
in Figure 3. The cDNA sequence has been cloned. It shows about 
81% identity at the nucleotide level to the murine sequence. 
Figure 4 shows an alignment between the open reading frame of the 
murine (top sequence) and human genes. The murine gene contains 
a trinucleotide repeat region which is not present in the human 
hVH-5 gene. Excluding this region, the two protein sequences are 
about 90% homologous (where this term means amino acid identity). 

The invention provides a protein of murine, human or other 
mammalian origin based on the cDNA information provided herein. 
As shown in Example 2, mRNA can be detected in the eye, brain, 
lung and other tissues of mice and human fetal liver, kidney, 
lung and brain tissue using the cDNA of the present invention, 
the cloning of which is described in Example 1. Translation of 
the cDNA obtained from Example 1 in a coupled reticulocyte assay 
indicated that it encodes a polypeptide of approximate molecular 
weight of 80 kD. 

Thus the invention provides a murine phosphatase designated M3/6 
or a human or other mammalian homologue thereof which phosphatase 
is characterized by the following features: 

(a) it is encoded by a cDNA sequence obtainable from a 
mammalian brain cDNA library, said DNA sequence being 
selectively detectable with a murine DNA sequence as 
shown in Figure 1 or one or more of the human DNA 
sequences shown in Figure 3; and 

(b) it comprises a phosphatase catalytic domain of the 
sequence VHCXAGXXRSX, where X is any amino acid. 

Preferably the catalytic domain sequence is VHCLAGISRSA. 

The protein desirably has the additional feature of either: 

(c') tyrosine phosphatase activity, or: 

(c") threonine phosphatase activity. 

Preferably it has both activities. 

The protein preferably has one or more of the additional 
features: 

(d) it has a sequence of 299 amino acids at its N-terminus 
which are substantially as the M3/6 sequence shown in 
Figure 5; 

(e) it has a cytoplasmic location in at least some cell 
types; 

(f) it is encoded by an mRNA of approximately 5kb; and 

(g) it comprises a C-terminal region rich in serine. 

In a preferred aspect, the phosphatase is M3/6 of murine origin, 
in which case at least one of features (c') and (c"), together 
with all of features (d) to (g), may be present. However the 
phosphatase may also be of human origin. In this case at least 
one of features (c') and (c") plus feature (d) may be present. 

The term "selectively detectable" means that the cDNA used as a 
probe is used under conditions where a target cDNA of the 
invention is found to hybridize to the probe at a level 
significantly above background. The background hybridization may 
occur because of other cDNAs present in the brain cDNA library. 
In this event background implies a level of signal generated by 
interaction between the probe and a non-specific cDNA member of 
the library which is less than one tenth, preferably less than one 
hundredth, as intense as the specific interaction observed with the 
target cDNA. The intensity of interaction may be measured, for 
example, by radiolabelling the probe, e.g. with 32P. 

Suitable conditions may be found by reference to the Examples. 
The cDNA of Figure 1 can detect both murine and human DNA at 
0.1xSSC, 0.1% SDS at 55°C (where 1xSSC is 0.15M sodium citrate, 
0.15 M sodium chloride at pH7.5). 

Tyrosine and threonine phosphatase activity assays are generally 
well known in the art and any suitable assay may be used. 
Reference may be made for example to Fischer et al, 1991, Science 
253: 401-406 or Zeng and Guan, 1993, J. Biol. Chem., 268: 
16116-16119. 

Preferably all the proteins of the invention will have dual 
specificity phosphatase activity, namely they are capable of 
dephosphorylation at both Ser/Thr and Tyr residues. However, it 
should be borne in mind that phosphatase activity for the M3/6 
protein has not yet been demonstrated, although homology with 
other sequences (see Example 1) strongly suggests this. It is 
therefore possible that this and the other polypeptides of the 
invention may in fact have alternative and/or additional 
activities, functions or roles. 

The sequence of 299 amino acids at the N-terminus of proteins of 
the invention will be at least 80%, preferably at least 90%, more 
preferably at least 95% and most preferably at least 97.5% 
homologous to the M3/6 sequence of Figure 5. 
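
Since homology is defined in this specification as amino acid identity, the thresholds above can be checked with a short calculation. The sketch below assumes two sequences that have already been aligned to equal length with "-" marking gaps; the choice to count only columns where neither sequence has a gap is an assumption, as the text does not fix a denominator, and both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def homology_tier(pct):
    """Highest threshold from the text (80, 90, 95, 97.5) that pct meets."""
    for threshold in (97.5, 95.0, 90.0, 80.0):
        if pct >= threshold:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical ten-residue fragments differing at one position.
    pct = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
    print(pct, homology_tier(pct))
```

Producing the alignment itself is outside this sketch; any standard pairwise alignment tool could supply the gapped strings.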

The cytoplasmic location of the protein of the invention in at 
least some cell types may be determined in accordance with methods 
described in the accompanying examples. Cells which may be examined 
to determine cellular localization are preferably neuronal cells 
such as PC12 cells. 

The protein is encoded by an mRNA of approximately 5kb. It 
will be appreciated that determination of mRNA size is often (as 
is the present case) established by northern blotting techniques 
and thus is limited in accuracy by the limitations of this 
technique. In addition there will be some heterogeneity of the 
size of the polyA tail of mRNA molecules. An approximate size 
of 5kb will be understood by those of skill in the art to have 
a tolerance of at least ±0.5 kb. Nonetheless, approximate mRNA 
size is still a useful characteristic for determining the 
identity of a protein. 

We have found that the C-terminal region of the murine protein 
encoded by the cDNA sequences of the invention is rich in 
serine, and also glycine. The present invention can thus be 
broadly thought of as relating to a phosphatase, such as a dual 
specificity phosphatase, that possesses at least one region of 
amino acid repeats at least in the murine variant of such a 
protein. Such repeats will be a contiguous and continuous repeat 
of the same amino acid. The repeat, or each repeat, may be of 
glycine (G) and/or serine (S) residues. Each repeat may be at 
least four or five, such as ten, amino acids in length. The 
repeat may be no longer than 20, such as 30 or 40, amino acids. 

The presence of at least one stretch of 19 contiguous serine 
residues in the C-terminal region (i.e. within 150 or 200 amino 
acids of the C-terminus) is indicative of at least the murine form 
of the protein of the present invention. This C-terminal region 
may also comprise a stretch of at least 17 contiguous glycine 
residues. 
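
The repeat criteria above (runs of a single residue, serine or glycine, located within the last 150 or 200 amino acids) can be screened for mechanically. The following is a minimal sketch under the assumption that the protein is given as a one-letter-code string; the function names, default values and toy sequence are hypothetical. A run that straddles the start of the C-terminal window is counted only by the part that lies inside the window.

```python
from itertools import groupby


def homopolymer_runs(protein, residues="SG", min_len=4):
    """Return (residue, 1-based start, length) for each qualifying run."""
    runs = []
    pos = 0
    for residue, group in groupby(protein.upper()):
        length = len(list(group))
        if residue in residues and length >= min_len:
            runs.append((residue, pos + 1, length))
        pos += length
    return runs


def has_c_terminal_run(protein, residue, min_len, window=200):
    """True if the last `window` residues hold a run of `residue` >= min_len."""
    tail = protein[max(0, len(protein) - window):]
    return len(homopolymer_runs(tail, residue, min_len)) > 0


if __name__ == "__main__":
    # Hypothetical toy sequence, not the sequence of any figure.
    toy = "M" * 50 + "SSSS" + "G" * 17 + "SSN" + "S" * 20 + "AK"
    print(homopolymer_runs(toy))
    print(has_c_terminal_run(toy, "S", 19), has_c_terminal_run(toy, "G", 17))
```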

In addition to M3/6 and mammalian homologues thereof the present 
invention also contemplates: 

(a) an allelic variant of such proteins; 
(b) a protein at least 80% homologous to such proteins; 
(c) a fragment of any one of such proteins (or of (a) or (b)) 
having phosphatase activity and being at least 15 
amino acids long; or 
(d) a fusion protein comprising such proteins (or any one 
of (a) to (c)). 

All proteins and polypeptides within this definition are referred 
to below as proteins or polypeptide(s) according to the 
invention. 

A protein or polypeptide of the invention will preferably be in 
substantially isolated form, i.e. in a form in which it is free 
of other polypeptides with which it may be associated in its 
natural environment (e.g. the body). It will be understood that 
the polypeptide may be mixed with carriers or diluents which will 
not interfere with the intended purpose of the polypeptide and 
yet still be regarded as substantially isolated. 

The polypeptide of the invention may also be in a substantially 
purified form, in which case it will generally comprise the 
polypeptide in a preparation in which more than 90%, e.g. 95%, 98% 
or 99% of the polypeptide in the preparation is a polypeptide of 
the invention. 

Mutant proteins or polypeptides are also contemplated in 
accordance with the invention. These will possess one or more 
mutations each of which is one or more additions, deletions, or 
substitutions of amino acid residues. Preferably the mutations 
will not affect, or substantially affect, the structure and/or 
function and/or properties of the polypeptide. Thus, mutants may 
suitably possess phosphatase activity, preferably dual 
specificity phosphatase activity. Mutants can either be 
naturally occurring (that is to say, purified or isolated from 
a natural source) or synthetic (for example, by performing 
site-directed mutagenesis on the encoding DNA). It will thus be 
apparent that polypeptides of the invention can be either 
naturally occurring or, preferably, recombinant (that is to say 
prepared using genetic engineering techniques). 

An allelic variant will be a variant which will occur naturally 
in a human or murine animal and which will dephosphorylate in a 
substantially similar manner to the proteins of the invention. 

Similarly, a species homologue of the M3/6 protein will be the 
equivalent protein which occurs naturally in another species, e.g. 
other than mouse or human, and which performs the equivalent or 
similar function in that species. Within any one species, a 
homologue may exist as several allelic variants, and these will 
all be considered homologues of the protein. Allelic variants 
and species homologues can be obtained by following the 
procedures described herein for the production of a protein of 
Example 3 and performing such procedures on a suitable cell 
source, e.g. from human or a rodent, carrying an allelic variant 
or from another species. Since the protein may be evolutionarily 
conserved it will also be possible to use a polynucleotide of the 
invention to probe libraries made from human, rodent or other 
cells in order to obtain clones encoding the allelic or species 
variants. The clones can be manipulated by conventional 
techniques to identify a polypeptide of the invention which can 
then be produced by recombinant or synthetic techniques known per 
se. Preferred species homologues include mammalian or amphibian 
species homologues. 

A protein at least 80% homologous to the M3/6 protein is included 
in the invention, as are proteins at least 90% and more 
preferably at least 95% homologous to this protein. This will 
generally be over a region of at least 20, preferably at least 
30, for instance at least 40, 60 or 100 or more contiguous amino 
acids. Methods of measuring protein homology are well known in 
the art, and it will be understood by those of skill in the art 
that in the present context homology is calculated on 
the basis of amino acid identity (sometimes referred to as "hard 
homology"). 

WO 97/06245 

PCT/GB96/01906 

Generally, polypeptide fragments of a M3/6 protein or its allelic 
variants or species homologues thereof capable of exhibiting 
phosphatase activity will be at least 10, preferably at least 15, 
for example at least 20, 25, 30, 40, 50 or 60 or 100 amino acids 
in length. 

It will be possible to determine whether the proteins or 
polypeptides of the invention exhibit phosphatase activity using 
standard routine techniques, a suitable test being given later 
in this specification in Example 6. Alternatively one may 
examine the sequence of the protein to see if it possesses the 
characteristic phosphatase catalytic domain, namely: 
(I/V)HCXAGXXR(S/T)G, wherein X represents any amino acid. In the 
M3/6 polypeptide of the invention this potential catalytic domain 
is found at residues 244-254 of the protein encoded by Figure 2, 
except that in both cases the C-terminal G is replaced by A. 



Preferred fragments of proteins of the invention include those 
which exhibit phosphatase activity and/or possess the above 
catalytic domain sequences. The Examples presented herein 
describe a number of methods to analyze the function of the 
protein and these may be adapted to assess whether or not a 
polypeptide possesses certain activities. 

In Figure 5 the conceptual translation of half of (the 5' end of) 
the M3/6 coding sequence is aligned and compared with the human 
CL100 MAP kinase phosphatase protein sequence. A high degree of 
homology between the sequences can be seen, further indicating 
phosphatase activity for the M3/6 protein. 

A polypeptide of the invention may be labelled with a revealing 
or detectable label. The (revealing) label may be any suitable 
label which allows the polypeptide to be detected. Suitable 
labels include radioisotopes, e.g. 125I, enzymes, antibodies and 
linkers such as biotin. Labelled polypeptides of the invention 
may be used in diagnostic procedures such as immunoassays in 
order to determine the amount of phosphatases in a sample. 

A polypeptide or labelled polypeptide according to the invention 
may also be fixed to a solid phase, for example the wall of an 
immunoassay dish. 

In a second aspect of the invention, there is provided a 
polynucleotide which comprises: 
(a) a sequence encoding a protein or polypeptide of the 
invention as defined above or the complement of said 
sequence; 
(b) a sequence of nucleotides shown in Figure 1 or Figure 2; 
(c) a sequence capable of selectively hybridising to a 
sequence in either (a) or (b); or 
(d) a fragment of any of the sequences in (a) to (c). 

The polynucleotide of the present invention is suitably in 
substantially isolated or purified form. 

Polynucleotides of the invention include the DNA sequence of 
Figure 1 and fragments thereof capable of selectively hybridizing 
to the sequence of Figure 1. Polynucleotides of the invention 
also include polynucleotides comprising human cDNA characterized 
by the presence of one or more of the sequences shown in Figure 3. 



The present invention in one embodiment provides a nucleic acid 
sequence comprising the nucleotide sequence according to Figure 
1. This (murine) sequence has an ATG initiation codon as shown 
in Figures 1 and 2. Amino acid residues of the protein 
of Figure 2 - which are also encoded by the sequence of Figure 
1 - show high homology to the cdc25 PTP of yeast 28 at residues 
29-49 and 117-136. The sequence shows high homology to several 
PTPs in the public database EMBL/GENBANK. This M3/6 gene is 
murine; parts of the gene encoding the human homologue (Hb5) 
are shown in Figure 3. 

The gene that encodes the protein we have called M3/6 (which 
appears to be a tyrosine phosphatase) contains a complex triplet 
repeat distal to the catalytic domain which is translated into 
protein. This domain comprises a run of four serine residues which is 
followed by a run comprising 17 glycine residues which in turn 
is followed by a further run comprising 23 serine residues which 
is interrupted near the N-terminal section by a single 
asparagine. 

It is thought that this repeat might cause instability of this 
domain if it expands. Any expansion of this triplet repeat may 
disrupt the normal activity of the protein in the cell and lead 
to a disease phenotype in a way similar to other neurological 
disorders. This protein is highly expressed in the brain with 
much lower levels in liver and spleen tissues. 

It will be appreciated that in polynucleotides of the invention, 
which encompass nucleic acid encoding a polypeptide of the 
invention, triplet repeats of the codons encoding the repeated 
amino acid residues may be present. Such codons may encode 
glycine and/or serine residues. Such triplet repeats may 
be at least 15, such as at least 30, bases in length, generally 
up to a maximum of 60 nucleotides. 

If glycine residues are repeated, then triplet repeats of (GGC)n 
or (GGT)n (which are 2 of the 4 codons encoding Gly) can be 
present. Here the number n is an integer, suitably from 4, 5 or 
10 up to 20. For serine, repeats of (AGT)m or (AGC)m (this 
residue has a degeneracy of 6) may exist. The variable m is also 
an integer, such as from 4 to 20. 
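
Tandem codon repeats of the kind just described can be located in a nucleotide sequence with a pattern search. The sketch below is illustrative only: the names are hypothetical, the decision to let the two glycine codons (or the two serine codons) alternate within one run is an assumption, and the search does not check the reading frame of the surrounding coding sequence.

```python
import re

GLY_CODONS = ("GGC", "GGT")  # the two glycine codons named in the text
SER_CODONS = ("AGT", "AGC")  # the two serine codons named in the text


def codon_repeats(dna, codons, min_units=4):
    """Return (1-based start, number of codons, matched run) per repeat.

    Assumption: codons from `codons` may be mixed within a single run.
    The match is not frame-aware.
    """
    unit = "|".join(codons)
    pattern = re.compile(rf"(?:{unit}){{{min_units},}}")
    hits = []
    for m in pattern.finditer(dna.upper()):
        hits.append((m.start() + 1, len(m.group()) // 3, m.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: ATG, five glycine codons, stop codon.
    toy = "ATG" + "GGC" * 3 + "GGT" * 2 + "TAA"
    print(codon_repeats(toy, GLY_CODONS))
```

Setting `min_units=5` corresponds to the 15-base lower bound given above; an upper bound such as 20 codons could be enforced by filtering the returned list.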

The polynucleotide of the invention may also comprise RNA. It 
may also be a polynucleotide which includes within it synthetic 
or modified nucleotides. A number of different types of 
modification to oligonucleotides are known in the art. These 
include methylphosphonate and phosphorothioate backbones, and 
addition of acridine or polylysine chains at the 3' and/or 5' 
ends of the molecule. For the purposes of the present invention, 
it is to be understood that the oligonucleotides described herein 
may be modified by any method available in the art. Such 
modifications may be carried out in order to enhance the in vivo 
activity or lifespan of oligonucleotides of the invention used 
in methods of therapy. 

A polynucleotide capable of selectively hybridizing to the DNA 
of Figure 1, 2 or 3 will be generally at least 70%, preferably 
at least 80 or 90% and optimally at least 95% homologous to the 
DNA of Figure 1, 2 or 3 over a region of at least 20, preferably 
at least 30, for instance at least 40, 60 or 100 or more 
contiguous nucleotides. These polynucleotides are also within 
the invention. 
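
The identity criterion above (a percentage over a region of at least 20 contiguous nucleotides) can be approximated computationally. The following is a rough sketch, assuming same-strand, ungapped comparison of every window of the stated length; the function names are hypothetical, and sequence identity is used only as a stand-in for hybridisation behaviour, which the specification defines experimentally.

```python
def best_window_identity(probe, target, window=20):
    """Highest percent identity between any ungapped windows of equal length."""
    probe = probe.upper()
    target = target.upper()
    if len(probe) < window or len(target) < window:
        raise ValueError("both sequences must be at least `window` long")
    best = 0
    for i in range(len(probe) - window + 1):
        p = probe[i:i + window]
        for j in range(len(target) - window + 1):
            t = target[j:j + window]
            matches = sum(1 for a, b in zip(p, t) if a == b)
            best = max(best, matches)
    return 100.0 * best / window


def meets_identity_floor(probe, target, window=20, floor=70.0):
    """True if some window pair reaches the stated minimum identity."""
    return best_window_identity(probe, target, window) >= floor


if __name__ == "__main__":
    # Hypothetical toy sequences; the target carries one mismatch.
    probe = "ACGT" * 5
    target = "TTT" + "ACGTACGTACGTACGTACGA" + "CC"
    print(best_window_identity(probe, target))
    print(meets_identity_floor(probe, target))
```

The exhaustive comparison is quadratic in sequence length, which is acceptable for short probes but would need a proper alignment tool for full-length cDNAs.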

A polynucleotide of the invention will be in substantially 
isolated form if it is in a form in which it is free of other 
polynucleotides with which it may be associated in its natural 
environment (usually the body). It will be understood that the 
polynucleotide may be mixed with carriers or diluents which will 
not interfere with the intended purpose of the polynucleotide and 
it may still be regarded as substantially isolated. 

A polynucleotide according to the invention may be used to 
produce a primer, e.g. a PCR primer, or a probe e.g. labelled 
with a revealing or detectable label by conventional means using 
radioactive or non-radioactive labels, or the polynucleotide may 
be cloned into a vector. Such primers, probes and other 
fragments of the DNA of Figure 1, 2 or 3 will be at least 15, 
preferably at least 20, for example at least 25, 30 or 40 
nucleotides in length, and are also encompassed within the 
invention. 

Polynucleotides, such as DNA polynucleotides, according to the 
invention may be produced recombinantly, synthetically, or by any 
means available to those of skill in the art. They may also be 
cloned by reference to the techniques disclosed herein. 

The invention includes a double stranded polynucleotide 
comprising a polynucleotide according to the invention and its 
complement. 

A third aspect of the invention relates to an (e.g. expression) 
vector suitable for the replication and expression of a 
polynucleotide, in particular a DNA or RNA polynucleotide, 
according to the invention. The vectors may be, for example, 
plasmid, virus or phage vectors provided with an origin of 
replication, optionally a promoter for the expression of the 
polynucleotide and optionally a regulator of the promoter. The 
vector may contain one or more selectable marker genes, for 
example an ampicillin resistance gene in the case of a bacterial 
plasmid or a neomycin resistance gene for a mammalian vector. 
The vector may be used in vitro, for example for the production 
of RNA, or used to transfect or transform a host cell. The vector 
may also be adapted to be used in vivo, for example in a method 
of gene therapy. 

Vectors of the third aspect are preferably recombinant replicable 
vectors. The vector may thus be used to replicate the DNA. 
Preferably, the DNA in the vector is operably linked to a control 
sequence which is capable of providing for the expression of the 
coding sequence by a host cell. The term "operably linked" 
refers to a juxtaposition wherein the components described are 
in a relationship permitting them to function in their intended 
manner. A control sequence "operably linked" to a coding 
sequence is ligated in such a way that expression of the coding 
sequence is achieved under conditions compatible with the control 
sequences. Such vectors may be transformed or transfected into 
a suitable host cell to provide for expression of a polypeptide 
of the invention. 

A fourth aspect of the invention thus relates to host cells 
transformed or transfected with the vectors of the third aspect. 
This may allow for the replication and expression of a 
polynucleotide according to the invention, such as the sequence 
of Figure 1, or the open reading frame thereof. The cells will 
be chosen to be compatible with the vector and may for example 
be bacterial, yeast, insect or mammalian. 

A polynucleotide according to the invention may also be inserted 
into the vectors described above in an antisense orientation in 
order to provide for the production of antisense RNA. Antisense 
RNA or other antisense polynucleotides may also be produced by 
synthetic means. Such antisense polynucleotides may be used in 
a method of controlling the levels of the M3/6 phosphatase 
protein in a cell and/or tissue. 

Thus, in a fifth aspect the invention provides a process for 
preparing a polypeptide according to the invention which 
comprises cultivating a host cell (e.g. of the fourth aspect) 
transformed or transfected with an (expression) vector of the 
third aspect under conditions providing for expression (by the 
vector) of a coding sequence encoding the polypeptide, and 
recovering the expressed polypeptide. 

The invention in a sixth aspect also provides (monoclonal or 
polyclonal) antibodies specific for a polypeptide of the 
invention. Antibodies of the invention include fragments thereof 
as well as mutants that retain the antibody's binding activity. 

The invention further provides a process for the production of 
monoclonal or polyclonal antibodies to a polypeptide of the 
invention. Monoclonal antibodies may be prepared by conventional 
hybridoma technology using a polypeptide of the invention as an 
immunogen. Polyclonal antibodies may also be prepared by 
conventional means which comprise inoculating a host animal, for 
example a rat or a rabbit, with a polypeptide of the invention 
and recovering immune serum. 

In view of the presence of sequences in proteins of the present 
invention which are substantially homologous to sequences present 
in other proteins, particularly phosphatases, the antibodies will 
preferably be selective for the M3/6 protein and its mammalian 
homologues, i.e. they will not recognize epitopes found on other 
phosphatases, particularly tyrosine/threonine phosphatases. 

Fragments of monoclonal antibodies which retain antigen binding 
activity, such as Fv, F(ab') and F(ab')2 fragments, are included in 
this aspect of the invention. In addition, monoclonal antibodies 
according to the invention may be analyzed (e.g. by DNA sequence 
analysis of the genes expressing such antibodies) and humanized 
antibodies with complementarity determining regions of an antibody 
according to the invention may be made, for example in accordance 
with the methods disclosed in EP-A-0239400 (Winter). 

The present invention further provides compositions comprising 
the antibody or fragment thereof of the invention together with 
a carrier or diluent. Polypeptides, polynucleotides, vectors and 
hosts of the invention can be present in compositions together 
with a carrier or diluent. These compositions include 
pharmaceutical compositions where the carrier or diluent will be 
pharmaceutically acceptable. 

Pharmaceutically acceptable carriers or diluents include those 
used in formulations suitable for oral, rectal, nasal, topical 
(including buccal and sublingual), vaginal or parenteral 
(including subcutaneous, intramuscular, intravenous, intradermal, 
intrathecal and epidural) administration. The formulations may 
conveniently be presented in unit dosage form and may be prepared 
by any of the methods well known in the art of pharmacy. Such 
methods include the step of bringing into association the active 
ingredient with the carrier which constitutes one or more 
accessory ingredients. In general the formulations are prepared 
by uniformly and intimately bringing into association the active 
ingredient with liquid carriers or finely divided solid carriers 
or both, and then, if necessary, shaping the product. 

For example, formulations suitable for parenteral administration 
include aqueous and non-aqueous sterile injection solutions which 
may contain anti-oxidants, buffers, bacteriostats and solutes 
which render the formulation isotonic with the blood of the 
intended recipient; and aqueous and non-aqueous sterile 
suspensions which may include suspending agents and thickening 
agents, and liposomes or other microparticulate systems which are 
designed to target the polypeptide to blood components or one or 
more organs. 

Polynucleotides, vectors, host cells and polypeptides according 
to the invention, and antibodies or fragments thereof and 
compositions comprising them, may be used for the treatment, 
regulation or diagnosis of conditions in a mammal, including man. 
Such conditions include those associated with aberrant (e.g. due 
to a mutation in the gene sequence) expression of one or more of 
the M3/6 or Hb5 proteins or related family members. Treatment 
or regulation of conditions with the above-mentioned moieties, 
especially polypeptides, antibodies, fragments thereof and 
compositions etc., will usually involve administering to a 
recipient in need of such treatment an effective amount of a 
polypeptide, antibody, fragment thereof or composition, as 
appropriate. 

The present invention further provides a method of performing an 
immunoassay for detecting the presence or absence of a 
polypeptide of the invention in a sample, the method comprising: 
(a) providing an antibody according to the invention; 
(b) incubating the sample with the antibody under 
conditions that allow for the formation of an 
antibody-antigen complex; and 
(c) detecting, if present, the antibody-antigen complex. 



Vectors carrying a polynucleotide according to the invention or 
a nucleic acid according to the invention may be used in a method 
of gene therapy. Methods of gene therapy include delivering to 
a cell in a patient in need of treatment an effective amount of 
a vector capable of expressing in the cell a polypeptide of the 
invention. 



Such vectors are preferably viral vectors. The viral vector may 
be any suitable vector available in the art for targeting 
particular cells. For example, Huber et al (Proc. Natl. Acad. 
Sci. USA (1991) 88, 8039) report the use of amphotropic 
retroviruses for the transformation of hepatoma, breast, colon 



WO 97/06245 



PCT/GB96/01906 




or skin cells. Culver et al (Science (1992) 256; 1550-1552) also 
describe the use of retroviral vectors in virus-directed enzyme 
prodrug therapy, as do Ram et al (Cancer Research (1993) 53; 
83-88). Engelhardt et al (Nature Genetics (1993) 4; 27-34) describe 
the use of adenovirus based vectors in the delivery of the cystic 
fibrosis transmembrane conductance regulator (CFTR) into cells. 

The invention also contemplates (diagnostic) assays. This might 
involve conducting an assay to find a dephosphorylation 
modulator, such as an inhibitor of dephosphorylation, or in other 
words an inhibitor of the polypeptides of the invention. It is 
thought that certain proteins (such as MAP kinases) are 
deactivated by dephosphorylation. Therefore, an inhibitor of 
dephosphorylation is likely to inhibit deactivation. 

Thus, one assay contemplated by the invention is to identify a 
modulator of the phosphatase polypeptides of the invention. The 
assay may comprise contacting a potential chemotherapeutic agent 
with a protein, such as an enzyme, that will usually be 
dephosphorylated by a phosphatase polypeptide of the invention, 
and observing the phosphorylation state of the enzyme. The 
enzyme may be present in an extract from a cell which contains 
that enzyme. Enzymes contemplated include kinases, such as MAP 
kinases. 

The polynucleotides of the invention may thus find use as probes 
in diagnosis, in particular diagnosis or prognosis of tumours 
associated with deletions in chromosome 11, particularly 
11p15 and more especially in 11p15.5. Such tumours include brain 
or lung tumours. 

These probes may be used to detect polynucleotides of the 
invention, which detection may indicate that an individual has, 
or possesses a predisposition to, a disease or disorder such as a 
neurodegenerative or proliferative disorder. Suitable probe 
detection and hybridisation techniques are well known in the art. 

The M3/6 and Hb5 genes of the invention may be responsible, if 
mutated, for various neurodegenerative or proliferative diseases. 
Several repeat regions have been identified in M3/6, namely 
triplet repeats that encode glycine (one repeat of 17 amino 
acids) and serine (3 repeats of 4, 5 and 19 amino acids 
respectively). The present invention thus relates to the 
diagnosis of susceptibility to disorders such as 
neurodegenerative or proliferative disorders by detecting the 
presence or absence of these repeat regions. By use of the 
unmutated gene or protein, individuals that possess a 
neurodegenerative disease or disorder may be treated. 

The probes of the present invention are hybridisable to 
polynucleotides of the invention suitably under low stringency 
conditions. However, it is preferred that hybridisation take 
place under high stringency conditions. By low stringency 
conditions one envisages 3X SSC (0.5M sodium chloride, pH7.5) at 
room temperature. High stringency conditions that are envisaged 
are 0.1X SSC (0.1M sodium chloride, pH7.5) at 65°C. 

It will be apparent that probes contemplated may be capable of 
hybridising to the region of triplet repeats. In the M3/6 gene, 
this is encompassed by nucleotides 1756 to 1875 of the sequence 
shown in Figure 1. Such probes will be at least 15, preferably 
at least 20, for example 25, 30 or 40, nucleotides in length. 
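
As an illustration only, the probe lengths given above can be related to the 120-nucleotide repeat region (nucleotides 1756 to 1875) with a short Python sketch. The function name `probe_windows` is hypothetical, and the coordinates are assumed to be 1-based and inclusive, which is an assumption about the numbering used in Figure 1.

```python
def probe_windows(region_start, region_end, probe_len):
    """Return 1-based inclusive (start, end) pairs for every probe of
    probe_len nucleotides lying wholly inside the region."""
    if probe_len < 15:
        # The text states that probes are at least 15 nucleotides long.
        raise ValueError("probes are at least 15 nucleotides long")
    last_start = region_end - probe_len + 1
    return [(s, s + probe_len - 1) for s in range(region_start, last_start + 1)]


# Enumerate all 25-mer probe positions inside nucleotides 1756 to 1875.
windows = probe_windows(1756, 1875, 25)
print(len(windows), windows[0], windows[-1])
```

Shorter probes give more candidate positions within the same region; longer probes give fewer.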

The invention can thus provide a method of screening for 
susceptibility to a disease or disorder such as a 
neurodegenerative or proliferative disorder, which method 
comprises detecting, and possibly analysing, the triplet repeat 
region (present in polynucleotides of the invention) of an 
individual. 

The method may involve the polymerase chain reaction (PCR). It 
is preferred that such methods will not require the use of 
radiolabelled nucleotides. Detection of a normal triplet repeat, 
such as is present in the M3/6 protein, may indicate that an 
individual's susceptibility to the neurodegenerative disorder or 
disease is low. 


The method may also extend to diagnosing susceptibility to a 
disorder or disease such as a neurodegenerative or proliferative 
disorder which method comprises detecting, if present, an 
amplification in a GGC, GGT, AGT or AGC repeat in a region of the 
human or animal genome that corresponds to the location of a 
polynucleotide of the present invention. 

An amplification in the polynucleotide repeat may be determined 
by removing a sample of genomic DNA from the patient, carrying 
out a PCR with primers upstream and downstream of the repeat 
region, and determining the amount of nucleic acid produced. PCR 
generally does not occur to a substantial extent across genomic 
DNA comprising a repeat of 30 repeats or more. Substantial 
amounts of nucleic acid are only produced by PCR carried out on 
a DNA fragment in which there is little or no amplification of 
the nucleotide repeat, i.e. less than 30 repeats. 
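
The 30-repeat threshold described above can be sketched in Python as follows. This is a minimal illustration that works on a known sequence string, not a model of the PCR itself; the names `longest_run`, `amplifies` and `PCR_LIMIT` are hypothetical, and the example sequences are invented for the purpose.

```python
import re

REPEAT_UNITS = ("GGC", "GGT", "AGT", "AGC")  # repeat units named in the text
PCR_LIMIT = 30  # runs of this many repeats or more are not expected to amplify


def longest_run(seq, unit):
    """Length, in repeat units, of the longest uninterrupted tandem run of unit."""
    runs = re.findall(f"(?:{unit})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def amplifies(seq):
    """True if every repeat unit has fewer than PCR_LIMIT tandem copies."""
    return all(longest_run(seq, unit) < PCR_LIMIT for unit in REPEAT_UNITS)


# Invented sequences: one with short runs, one with a 35-copy expansion.
normal = "AT" + "GGC" * 17 + "TT" + "AGC" * 19
expanded = "AT" + "GGC" * 35 + "TT"
print(longest_run(normal, "GGC"), amplifies(normal), amplifies(expanded))
```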

In the accompanying drawings, which are provided to illustrate 
the present invention: 

Figure 1 gives a cDNA sequence of the M3/6 gene; 

Figure 2 gives another cDNA sequence of the M3/6 gene and 
the open reading frame thereof; 

Figure 3 shows portions of the cDNA sequence encoding the 
Hb5 human homologue; 

Figure 4 shows an alignment between the open reading frame 
of the murine (top sequence) M3/6 and human Hb5 genes. The 
latter is as disclosed in Martell et al, ibid; 

Figure 5 gives the N-terminal sequence of the M3/6 protein, 
and aligns it with the CL100 phosphatase, from which two 
proteins a consensus sequence is derived; and 

Figure 6 is a graphical representation of the hVH-5 gene 
structure. 

The following Examples describe the isolation and 
characterization of the novel protein and DNA of the invention 
from murine and human sources. However, other (e.g. mammalian) 
sources are within the scope of the present invention and other 
mammalian homologues of the protein may be isolated in an 
analogous manner. The Examples are presented here by way of 
illustration and are not to be construed as limiting on the 
invention. 
EXAMPLE 1 - Sequence data 

A novel nucleic acid sequence (murine M3/6) is presented which 
encodes a putative dual specificity threonine-tyrosine 
phosphatase which may be used in the characterisation of 
signalling mechanisms in brain and muscle. The presence of a 
complex trinucleotide repeat, located at the 3' end of this 
sequence and which is translated, makes this phosphatase gene a 
candidate for a human disease caused by repeat expansion or 
mutation. Fragments of the human gene homologue (Hb5) are also 
presented. 

Isolation of M3/6. 

A human fragment from a yeast artificial chromosome (YAC) was 
isolated. Such YACs contain well over 50kb and, to produce 
smaller, manageably sized segments for analysis, were subcloned 
into cosmids of 45kb or less each. A series of cDNA clones was 
identified from these cosmids. 

M3/6 was isolated from a mouse brain cDNA library constructed 
from oligo dT and random primed cDNA (Blake, D. J., Nawrotzki, R. 
and Davies, K. E. Isoform diversity of the murine 87K 
postsynaptic protein; submitted), cloned into the EcoRI site of 
the vector pcDNAII. 

pcDNAII is a 2.9 kb plasmid vector from Invitrogen and contains 
the ampicillin resistance gene. The M3/6 cDNA was isolated from 
the host cells, XL1-Blue, by the standard alkaline lysis method of 
preparing plasmid DNA. The 2.5 kb insert containing the entire 
M3/6 cDNA was released from the vector by digestion with the 
restriction enzyme EcoRI. The insert was separated from the 
vector by gel electrophoresis on 1% agarose and purified using 
spin columns. The purified insert was radiolabelled using the 
Amersham Megaprime labelling kit. M3/6/4e is a deletion 
derivative of M3/6 generated by the Erase-a-Base system. It 
encompasses nucleotides 1 to 1000 of M3/6 and can be released 
from the vector using the restriction enzymes EcoRI and XbaI. 



Sequencing of M3/6. 
A nested set of deletion clones of the 2.5 kb cDNA was generated 
using the Erase-a-Base System commercially available from 
Promega. These clones were sequenced using the double stranded 
sequencing protocol from USB. Sequencing reactions were resolved 
on a standard 6% acrylamide gel and visualised by autoradiography 
after overnight exposure at room temperature. Sequence analysis 
was done using the GCG Wisconsin package version 8. 

Sequence comparisons (see Figure 5) suggest that the novel M3/6 
gene described is also a dual specificity phosphatase and will 
be able to dephosphorylate MAP kinase. In addition, portions of 
the murine M3/6 gene show considerable homology to the human Hb5 
gene homologue. 

The human Hb5 gene was isolated by screening a commercially 
available Clontech human foetal brain cDNA library with the 
M3/6 sequence. 

EXAMPLE 2 - Protein distribution in tissues 

RNA extraction and Northern blotting. 

RNA was extracted from mouse tissue following the method of 
Chomczynski, P. and Sacchi, N. (1987, Anal. Biochem. 162, 
156-159). Poly A+ RNA was prepared from 100 µg of total RNA 
using the Dynabeads mRNA purification kit from Dynal. Northern 
blots were prepared according to Current Protocols in Molecular 
Biology. The human fetal tissue Northern was obtained from 
Clontech. Hybridisation was carried out at 42°C and the blots 
were washed to a stringency of 0.1xSSC, 0.1% SDS at 55°C. The 
blots were visualised by autoradiography after exposure for one 
to two days at -70°C. 

The results of the hybridisation of the M3/6 clone to a Northern 
blot containing several mouse tissues were examined. A band at 
5kb in mouse eye and brain was seen, but no bands of significance 
were seen in spleen, skeletal muscle, small intestine or liver. 
The filter was washed at 0.1 x SSC at 50°C. A 1.8kb band is seen 
in mouse lung with a faint band at 5kb. A similar band at 5kb 
is seen when a subclone of M3/6, designated M3/6/4e (nucleotides 
1 to 1004) was used. In this case the 5kb band in lung is much 
stronger. 

Hybridisation of M3/6/4e to a Northern blot of human fetal 
tissues again showed the 5kb transcript predominantly in the 
brain and to a lesser extent the lung, but not the kidney or 
liver to any significant extent. This blot is evidence of the 
sequence conservation of this gene between mouse and man. 

EXAMPLE 3 - Assay 

In vitro transcription-translation assay. 

The pcDNAII vector utilises SP6 and T7 promoters which can be 
directly used for in vitro transcription-translation assays. One 
µg of RNase-free circular plasmid DNA containing the insert was 
used for each reaction. The assay was performed according to 
the instructions provided with the Promega TNT Coupled 
Reticulocyte Lysate Systems. The synthesized proteins were 
analyzed by SDS gel electrophoresis on a 10% acrylamide gel and 
visualised by autoradiography after 1 hr exposure at room 
temperature. Luciferase-encoding plasmids were used as controls 
for this assay. 

An analysis of the M3/6 clone in the transcription/translation 
coupled reticulocyte assay indicated that the protein product was 
80kD, indicating that the translation of the mRNA must extend 
through and beyond the triplet repeats. The assay was carried 
out using a kit from Promega according to the manufacturer's 
instructions. 

EXAMPLE 4 - B8 Homology 

Hybridisation of oligonucleotide M3/6-C. 

M3/6-C is a 19-mer oligonucleotide the sequence of which is 
CTTGGTCATCGACAGCCGG and is from the cdc25 homology region of 
M3/6. The oligonucleotide was radiolabelled using Promega 
polynucleotide kinase and >dATP according to the manufacturer's 
instructions. Hybridisation was effected at 42°C and washes 
were done in 3xSSC, 0.1% SDS at room temperature for the cosmids 
and at 55°C for the cosmid subclones. 
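
As a quick check on the oligonucleotide described above, the following Python sketch confirms its length and derives its reverse complement. The melting-temperature estimate uses the generic Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is an assumption introduced here for illustration and is not a calculation taken from this specification; the function names are likewise hypothetical.

```python
OLIGO = "CTTGGTCATCGACAGCCGG"  # M3/6-C, as given in the text


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(len(OLIGO), reverse_complement(OLIGO), wallace_tm(OLIGO))
```

The Wallace estimate is only a rough guide for short oligonucleotides and does not account for salt concentration, so it should not be read as a statement about the wash conditions actually used.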

The subclone filters were washed at 3xSSC at 55°C. A strong 
signal was obtained. This suggests that a human sequence with 
high homology to this motif is present in B8. B8 contains 
markers (e.g. Gl and CMS1) which are in linkage disequilibrium 
with autosomal spinal muscular atrophy. Thus this PTP is a 
candidate for this motor neuron disorder and parts of it may be 
useful diagnostically. 

EXAMPLE 5 - M3/6 Expression 

Proof that M3/6 encodes a cytoplasmic protein, which was 
expressed in PC12 cells, was derived from the following 
experiment. A plasmid capable of directing expression of the 
M3/6 gene in a mammalian cell line was constructed by cloning the 
M3/6 cDNA into the polylinker of the vector pEFmycpLINK 
(reference 29). This 
results in the expression of a fusion protein between the myc 
epitope (MEQKLISEEDL), recognised by the monoclonal antibody 
9E10, and the protein of the invention under the control of the 
Elongation Factor gene promoter. This DNA construct was 
microinjected into PC12 cells. These cells are able to undergo 
neurite outgrowth typical of neuronal cells when stimulated with 
Nerve Growth Factor (NGF). Expression of M3/6 was monitored by 
staining with the α-myc epitope antibody 9E10. This revealed the 
surprising and novel finding that M3/6 encodes a cytoplasmic 
protein. Where the localization of other potential MAP kinase 
phosphatases has been determined, this has been exclusively 
nuclear. Furthermore, expression of M3/6 failed to block 
NGF-stimulated neurite outgrowth, which is surprising as expression 
of a MAP kinase phosphatase might be expected to block this MAP 
kinase dependent process. 

Given the unusual cytoplasmic location of M3/6 it is possible 
that mutation or frameshift at the triplet repeat could lead to 
a change in the subcellular location of the protein. This might 
lead to its relocation to the nucleus which may be a 'default' 
location for proteins of this family. Potentially this could 
lead to a loss or gain of function. 

EXAMPLE 6 - Tests for MAP kinase phosphatase activity 

The putative phosphatase was expressed and purified from 
bacterial or insect or mammalian cells followed by incubation 
with in vitro 32P-phosphorylated MAP kinase. Dephosphorylation 
of the MAP kinase can be assayed by gel electrophoresis followed 
by autoradiography. 

An alternative assay involves co-expression of the putative 
phosphatase and a myc-epitope tagged version of MAP kinase in COS 
cells. Stimulation of these transfected cells with e.g. serum 
or EGF leads to a mobility shift in the MAP kinase which is 
revealed by gel electrophoresis, western blotting and probing 
with a myc epitope recognising antibody. Co-expression of a MAP 
kinase phosphatase should lead to the abolition of this mobility 
shift. 

This specification describes the identification of a novel gene 
encoding a novel protein that is highly likely to be a 
phosphatase which is a member of a sub-family of dual specificity 
threonine-tyrosine phosphatases expressed in neuronal tissue. It 
has a motif which shows very high homology to the yeast cdc25 
tyrosine phosphatases and possesses the characteristic 
conserved catalytic domain of all phosphatases. A 
transcription/translation coupling experiment (Example 3) has 
confirmed the presence of an expressible open reading frame and 
strongly suggests that the complex repeat is expressed as part 
of the 3' domain of the molecule. Since this may expand by 
replication slippage or other mechanisms as in other neurological 
disorders, any change in the size of this triplet repeat may give 
rise to molecular pathology. 
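
By way of illustration only, the catalytic-domain motif VHCXAGXXRSX recited in claim 1 (where X is any amino acid) can be searched for in a protein sequence with a regular expression. The helper name `find_catalytic_motif` and the test peptides below are hypothetical and are not taken from the M3/6 sequence.

```python
import re

# VHCXAGXXRSX from claim 1, with X (any amino acid) written as a
# one-letter-code character class.
MOTIF = re.compile(r"VHC[A-Z]AG[A-Z]{2}RS[A-Z]")


def find_catalytic_motif(protein):
    """Return (0-based start, matched text) of the first motif hit, or None."""
    hit = MOTIF.search(protein.upper())
    return (hit.start(), hit.group()) if hit else None


# Hypothetical peptides used only to exercise the pattern.
print(find_catalytic_motif("AAVHCQAGISRSAKK"))
print(find_catalytic_motif("AAVHCQAGISKSAKK"))  # conserved R replaced by K
```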

The cross-hybridisation of this sequence to human 
sequences derived from the candidate gene region for SMA makes 
this a candidate gene for the disorder. Since the gene is 
expressed in brain and lung, but predominantly in brain, any 
change in the function of the gene might give rise to a 
neurological disorder. This can be tested by using the sequence 
or part of the sequence as a probe to hybridise to DNA from 
patients with such diseases. Alterations in DNA from patients 
might also be seen using PCR primers derived from the 
corresponding human sequence. A change in the protein might also 
be detected using antibodies raised from peptides or expressed 
portions of this sequence and the investigation of muscle 
biopsies. 

EXAMPLE 7 

Chromosomal Localization and Genomic Organization of Hb5. 

HB5 cDNA was used previously for fluorescence in situ 
hybridisation (FISH) analysis on human metaphase chromosomes 
(Theodosiou et al., 1996). It mapped to three locations with the 
principal peaks being on 10q11.2 and distal 11p15, with a further 
peak on 10q22. To further refine the chromosomal localisation 
of HB5 both a human chromosome 11 cosmid library (Smith et al., 
1993) and a total human genomic PAC library (Ioannou et al., 
1994) were screened with M3/6-4e and HB5 respectively. Four 
cosmids and nine PACs were isolated. All nine PACs gave an 
identical PstI restriction pattern when probed with HB5 that was 
entirely different from that of the cosmids, whose pattern was 
similarly identical. However both PACs and cosmids showed 
cognate bands with PstI-digested total human genomic DNA. Thus 
cDNA-positive cosmid bands of approximately 2.3kb (doublet), 
1.6kb (dimorphic with a 2.0kb band) and 1.05kb and PAC bands of 
1kb and 3kb are seen in digests of total human DNA. The 
possibility of other copies of this gene or related genes is 
suggested by other bands seen in the genomic DNA digests at 
approximately 1kb and 4kb. To assign a chromosomal localisation 
to these two separate genomic clone contigs one cosmid (cSRL 
15a6) and two of the PACs (86N13 and 234B10) were used in FISH 
experiments. The cosmid maps uniquely to 11p15.5, whilst the 
PACs both map to 10q11.2. No signal from either was seen 
corresponding to the minor peak identified by FISH with HB5 cDNA 
on 10q22 (Theodosiou et al., 1996). 




PCR analysis of the PAC clones using oligonucleotide primers 
based on the cDNA sequence showed that in all cases the 
PCR amplification product was exactly the same size in both 
PACs and cDNA. This suggests that the PAC, and hence the 
chromosome 10q11.2 copy of the gene, is intronless and presumably 
a pseudogene. In contrast, the PCR products from the cosmid 
were, for the most part, larger than the cDNA, suggesting the 
presence of introns. Subclones of the cosmid that were positive 
by hybridisation to the cDNA were sequenced to determine the 
intron/exon boundaries and the flanking intronic sequences by 
comparison with the cDNA sequence. A graphical representation 
of the hVH-5 gene is shown in Fig. 6. The 1875bp open reading 
frame coding for hVH-5 is distributed over 6 exons, the smallest 
of which is 124bp (Table 1). The sizes of exon 1 and exon 5 have 
not been determined since neither the transcription start site 
nor the polyadenylation signal for this gene has been found but, 
given a mRNA size of 5kb (Martell et al., 1995; Theodosiou et 
al., 1996) and assuming no 5' or 3' untranslated exons (as in 
CL100 (Kwak et al., 1994)), the gene is spread over not more than 
13kb. 

The introns range in size from 193bp to approximately 4.75kb, 
with the second intron being by far the largest. The first exon 
contains the initiating methionine and the first CH2 (cdc25 
homology 2) domain. The second CH2 domain is split between exons 
2 and 3. Exon 5 contains the entire conserved catalytic PTPase 
domain, whilst the entire PEST (proline, glutamic acid, serine 
and threonine-rich) domain is contained within exon 6, which also 
contains the translation termination codon and all the 3' UTR so 
far described. 

Using conserved primers flanking the trinucleotide repeat found 
in the mouse cDNA, M3/6 (Theodosiou et al., 1996), PCR analysis 
using the cosmids and PACs as templates showed that both the 
chromosome 10 and 11 copies of the human gene gave the size of 
product predicted from the human hVH-5 cDNA sequence. This was 
confirmed in the chromosome 11 cosmid by sequencing using these 
same primers. In addition, no polymorphism for this repeat was 
noted among a small number of human individuals. Thus no 
evidence was found for a copy of this gene containing the complex 
trinucleotide repeat found in mouse. 

EXAMPLE 8 - Loss of Heterozygosity of hVH-5 in lung tumours 

The region of chromosome 11p to which hVH-5 maps has been 
implicated in the development of non-small cell lung cancer 
(NSCLC), breast cancer, rhabdomyosarcoma, Wilms' tumour, bladder 
cancer and testicular cancer (see Bepler and Garcia, 1994). 
Given the previously suggested potential tumour-suppressor 
activity of hVH-5 (Martell et al., 1995; Theodosiou et al., 1996) 
we investigated loss of heterozygosity at this locus in 15 lung 
tumour samples. Eight of these were heterozygous for a PstI 
polymorphism in DNA from normal blood, one of which showed loss 
of heterozygosity in DNA from the corresponding tumour. 
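
The counts reported above can be restated as proportions with a short Python sketch. The variable names are illustrative only, and no statistical significance is implied by so small a sample.

```python
from fractions import Fraction

samples = 15      # lung tumour samples examined
informative = 8   # heterozygous for the PstI polymorphism in blood DNA
with_loh = 1      # informative cases showing loss of heterozygosity

# Only heterozygous (informative) cases can reveal loss of heterozygosity,
# so the LOH rate is taken over the informative cases, not all samples.
informative_fraction = Fraction(informative, samples)
loh_fraction = Fraction(with_loh, informative)

print(f"informative: {float(informative_fraction):.1%}, "
      f"LOH among informative: {float(loh_fraction):.1%}")
```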

EXAMPLE 9 - Analysis of Methylation at the hVH-5 locus 

A number of genes which map to human chromosome 11p15.5 are 
imprinted, that is only one of the parental alleles is expressed 
in somatic cells (Barlow, 1995). These include IGF2 and H19 
(Rainier et al., 1993). One phenomenon associated with 
imprinting is the differential methylation of the parental 
alleles (Barlow, 1995). It has recently been suggested that 
imprinted genes have few and small introns (Hurst et al., 1996). 
Since hVH-5 both maps to chromosome 11p15.5 and has few, small 
introns, there is the possibility that this gene might also be 
imprinted. Imprinting of the gene from normal adult brain and 
lung is studied as is imprinting in fetal and tumour cells. 
Comparison of the patterns of imprinting may be used to provide 
diagnostic and/or prognostic assays of disease status. Diseases 
include proliferative diseases of lung and/or brain tissue. 
CLAIMS 

1. A polypeptide comprising a murine phosphatase designated 
M3/6 or a human or other mammalian homologue thereof which 
phosphatase is characterized by the following features: 

(a) it is encoded by a cDNA sequence obtainable from a 
mammalian brain cDNA library, said DNA sequence being 
selectively detectable with a murine DNA sequence as 
shown in Figure 1 or one or more of the human DNA 
sequences shown in Figure 3; and 

(b) it comprises a phosphatase catalytic domain of the 
sequence VHCXAGXXRSX, where X is any amino acid. 

2. A polypeptide comprising: 

(a) an allelic variant of a protein as defined in claim 1; 

(b) a protein at least 80% homologous to a protein as 
defined in claim 1; 

(c) a fragment of a protein as defined in claim 1 or (a) 
or (b) above having phosphatase activity and being of 
at least 15 amino acids long; or 

(d) a fusion protein comprising a protein as defined in 
claim 1 or any one of (a) to (c) above. 

3. A polypeptide according to claim 1 or 2 carrying a revealing 
or detectable label. 

4. A polypeptide according to claim 1, 2 or 3 fixed to a solid 
phase. 

5. A polynucleotide which comprises: 

(a) a sequence encoding a protein or polypeptide as 
defined in claim 1 or 2 or the complement of said 
sequence; 

(b) a sequence of nucleotides shown in Figure 1 or Figure 2; 

(c) a sequence capable of selectively hybridising to a 
sequence in either (a) or (b); or 

(d) a fragment of any of the sequences in (a) to (c). 



6. A polynucleotide according to claim 5 which is a DNA 
polynucleotide. 

7. A polynucleotide according to claim 5 or 6 which comprises 
at least 20 nucleotides. 

8. A polynucleotide according to any of claims 5 to 7 which 
comprises the cDNA sequence shown in Figure 1. 

9. A polynucleotide according to any of claims 5 to 8 carrying 
a revealing or detectable label. 

10. A vector comprising a polynucleotide according to any of 
claims 5 to 9. 

11. A vector according to claim 10 which is a recombinant 
replicable vector comprising a coding sequence which encodes a 
polypeptide as defined in claim 1 or 2. 

12. A host cell comprising a vector according to claim 11. 

13. A host cell according to claim 12 transformed by, or 
transfected with, a recombinant vector according to claim 11. 

14. A host cell transformed by a recombinant vector according 
to claim 11 wherein the coding sequence is operably linked to a 
control sequence capable of providing for the expression of the 
coding sequence by the host cell. 

15. A process for preparing a polypeptide as defined in claim 
1 or 2, the process comprising cultivating a host cell according 
to any of claims 12 to 14 under conditions providing for 
expression of the recombinant vector of the coding sequence, and 
recovering the expressed polypeptide. 

16. An antibody or a fragment thereof capable of binding to a 
polypeptide as defined in claim 1 or 2. 
17. A screening assay for identifying a putative 
chemotherapeutic agent for the treatment of disease, the assay 
comprising: 

(A) bringing into contact: 

(i) a phosphorylated polypeptide which can be dephosphorylated 
by M3/6; 

(ii) a polypeptide as defined in claim 1 or 2; and 

(iii) a putative chemotherapeutic agent; 

under conditions in which component (ii) would dephosphorylate 
component (i) in the absence of (iii); and 

(B) measuring the extent to which component (iii) is able to 
disrupt, interfere with or inhibit dephosphorylation. 

18. An assay according to claim 17 wherein the putative 
chemotherapeutic agent is a fragment of 10 or more amino acids 
of a polypeptide as defined in claim 2. 

19. A method of diagnosing susceptibility to a disease or 
disorder, the method comprising detecting an amplification or 
mutation in a (GGC)n, (GGT)n, (AGT)m or (AGC)m repeat where n and 
m independently represent an integer from 2 to 20 in a region of 
the human genome corresponding to the location of a 
polynucleotide according to claim 5. 

20. A method according to claim 19 wherein n is an integer from 
15 to 20 and m is an integer from 4 to 20. 

21. An isolated polypeptide which comprises the M3/6 sequence 
shown in Figure 2. 

22. An isolated polynucleotide encoding the polypeptide of claim 
21. 

23. An isolated polynucleotide according to claim 22 which has 
the sequence depicted in Figure 2. 
FIGURE 1 

E£CAGGTCTGGCACCATGC^ 

HCAGGGGTCACTCTCCC^^ 

GAAGGTGATGGACGCAAAGAAACTGGCMCCTGCT^ 

G6TCATCGACAGCCGGTCCTTCGTGGAGTATAACAGCTGCCAC6TGCT6AGCTCTGTGAA 

TATCTGCTGTTCAAAGCTGGTGAAGCGGCGCCTC 

GCTTATC(7£CCTGCTACACGGAGC^^ 

GTATC^CAGAGCACACGAGATGCCAGCCTGCTG 

GCTCAGCAAGCTGGACGGCTGCnCGACAGTGTGGCCATCCTCACAGGACGTTCGCCACC 
nCTCCTCCTGCTTCCCTt^CTCTGTGAGGCMGCCTGCCACTCTACCGTCCATGAGCC 

TCTCTCAGCCCTGCCTGCCTGTGCCCAGT^ 

ACCTGGGCTCTCAGAAAGATGTCTTGAACAAGGATCTGATGACCCAAAACGGAATAAGCT 
ATGTCCTWTGCMCMCTCCTGCCCTAMCC^GACTTCATCTGTGAGAGCCGTTTCA 
TGCGTATCCCCATCMTGAWCTACTGTGAAAAGCTGCTGCCCTGGCTGGACAAGTCCA 
TCGAGTnAnGATAAAGCCMGCTGTCOTCTGCCAAGTCATTGTTCACTGTCTGGCTG 

GCATCTCTCtTTCTGCCACCA^^ 
CTGACGACGCATA&^TTTGTGAAGGATCGG^ 

TCCTGGGCt^TTGCTGGAGTATGAGAGGAGTCTGAAGCTGtrTGGCTGCCCTGCAGACTG 
ATGGACCTCACTTGGGGACCCCTGAGCCCCTCATGGGCXCGGWGCAGGCATCCCACTGC 
CCCGGCTGCCACCATCTACCTCAGAGAGCGCTGCCACTGGGAGCGAGGCAGCCACCGCAG 
CCAGGGAGGGCAGCCCAAGTGCTGGAGGG . ATGC . . TCCGATCCCCAGCACAGCTCCAGC 
CACCAGCG. . CGGCTGCAG . CAGG , CCTQC , GTGGCCTGCACCTCTCCTCTGACCGCCTC 
CAGGACACCAACCGCCTCAAG . CGTTCCTT7TCCCTGGACATCAAGTCGGCCTATGCACC 
CAGCAGGAGGC(XGACTTTCCCGGCCCNACCNGACCCCCGGTGAAGCCCCM^TCTN 

GCTGACABCCNGjCtQQGGGNACACTOGGOT 

TCCGntCAGAGJGO^^ 
CCCGCTCCCCC(XGCAT(KTCTGGGCCTGAACm(^ 

GGCASSCTCTCGGCCCTGTCGGCGCCCGGGCTGCCTGGCCCTGCCAGCCGGCTGGNCCCG 
GGGGCTG^GCOSCCACTGGACTCCCCAGGCACACCGTCGCCN 
CAGGCGCTGTGnCTCCCCTTTGGCCGGGTAAGTGCAGGCGNANCTGGACCCGGTAACAG 
CAGCAGCAGYGGTGGTGGTGGTGGTGGTGGYGGCGGCGGCGGCGGCGGCGGCGGCGG^ 

<y«CAGCAGCAH»G^^ 

TAGTAGTAGTAGTAGTGACCTGCGGAGGCGGGATGTG(£GACCGGCTGGCCC^AGGAGCC 
TGCTGCAGATGCACAGflTTCAAGAGGCGCAGCT^ 
GGAGGGGCGGCACGTGCAGAGCTCCTGGCAGNC^ 
GCCTGf^CATCGAAGTATCGTGCCTTCAGAACTCaTGTGCOT 
CCAGCTATAAATATATAnATATATAAAACACACAGAAAAGGTA^ .TBC 
MTTTTTATCAAGAAGTAAATATT . CGATTTTT . ATTTATTTAAGCTAGTGATCTGGCAA 
CTGTGCGGGGCGGC CCTAAAGCTCTGTTTTT ACTGTCTGGTATTTAAACTGAAACAGGTT 
TCTMGCMTATGAGGCCACCnCMTCCCAMCTGGGTTGACAGGCCTGGGCCCCTGCT 
T GCCCCTCCCCTCTGGAAACATTACTGACC1TTCAAAGAGCTGCC WGCTTTCCT 
TTTTTACATAAGAAAAAAGGGGGGGGGGRAA 
FIGURE 2 

FIGURE 3 

SmaI subclone of HB5a/3 #2 sequenced with T3 17/7/95 

SCORES Init1: 463 Initn: 527 Opt: 557 

85.2% identity in 182 bp overlap 

1049 1039 1029 1019 1009 999 990 
M3-6.S AAGTGAOTCCATCAGTCTGCAGGGCAGCCAGCAGOTCAGACT 

10 20 30 

989 979 969 959 949 939 930 

M3-6.S MCTGGCCCAGGAAGTTGAAGnGGGCGAG^ 

40 50 60 70 80 90 

929 919 909 899 889 879 870 

M3-6.S TATG(£TCGTCAGAAGACATGCCCATGGT^ 

H b552 t WM^^ 

100 110 120 130 140 

869 859 849 839 829 819 810 

M3-6.S GAGC6AGAGATGCCAGCCA6ACAGTGAACAATGACTT GGCAGCTGGACAGCTTGGCTTTA 

Hb5s2t 

150 160 170 180 

809 799 789 779 769 759 750 

M3-6.S TCAATAA/OCXyVTGGACTTGTO 

SCORES Init1: 343 Initn: 343 Opt: 387 

78.8% identity in 156 bp overlap 

1880 1890 1900 1910 1920 1930 
M3-6.S CGGAGGGSGGATGTGCGGACCGGCTGGfXCGAGGAGCCTGCT 

ilUIJi 

10 20 30 

1940 1950 1960 1970 1980 1990 
M3-6.S AGGCGWGCJGCD^TGGAGnCGAAGAGGt^TGGTGGAGGGGCGGCACGTGWGAK 

40 50 60 70 80 

2000 2010 2020 2030 2040 2050 
M3-6.S TCCTGGCAGNCCT - GGCAAG WAACCAGCTTCTCTGGCAGCGT GGAi^TCATCGAAGTAT 

90 100 110 120 130 140 

2060 2070 2080 2090 2100 2110 

M3-6.S CGTKCnWGAAGTCCCTGTtrc^ 
tiii 

HbSslt ictGACCCC 
150 
SCORES Init1: 241 Initn: 241 Opt: 323 

82.4% identity in 119 bp overlap 

1359 1349 1339 1329 1319 1309 
M3-6.S CNGGTKGGGCCGGGAAAGTCGGGCCTCCTGCT^GGCTGCATAGGCCGACTTGATGTCCAGG 

Hb5s3t 

10 20 30 

1299 1289 1279 1269 1259 1249 
M3-6.S 6AAAAG6AACGCTT6AGGCGGTTGGTGTCCTGGAGGCGGTCA6AG6AGAGGT6CAGGCCA 

yHIW 1U1H1M 

40 50 60 70 80 90 

1239 1229 1219 1209 1199 1189 
M3-6.S (XCAGGCCTGCTGCAGCCGCGCTGGTGGCTGGAGCTGTGCTGGGGATCGGAG(^TCCCT^ 

Hb5s3t 

100 110 



PstI subclone of HB5a/3 #2 sequenced with T7 primer 17/7/95 

SCORES Init1: 213 Initn: 213 Opt: 227 

79.5% identity in 78 bp overlap 

1029 1019 1009 999 989 979 

M3-6.S CTGWGGGWGCCAGCAGCnCAGACTCCTCTWTACTCCAGCAACTGGCCCAGGAAGn 

WiiHli 

10 20 30 

969 959 949 939 929 919 

M3-6.S GMGTTGGGC6AGATGGAGGGGCGCCGATCCTTCACAAACCT6TATGCGTC6TCA6AAGA 

Hb5p2t 

40 50 60 70 

909 899 889 879 869 859 

M3-6.S CATGCCCATGGTTTTOTGAT^ 



SCORES Init1: 169 Initn: 169 Opt: 202 

81.3% identity in 75 bp overlap 

1330 1340 1350 1360 1370 1380 

M3-6.S mCCCGKCCNACQ«aCCra^ 

10 20 30 
1390 1400 1410 1420 1430 1440 
M3-6.S TGGGGGNACACTGGGCCTGCCCTCGCCWGCCW6ACAGCCCGGACTCCGTTCCAGAGT6 

40 50 60 70 

1450 1460 1470 1480 1490 1500 
M3-6.S CCGCCCAt^CCCCGCCGCGACGCCCCCCreGCTAGnCGCCTGCCCGCTCCCCCGCGCA 



PstI subclone of HB5a/3 #2 sequenced with T3 primer 17/7/95 

SCORES Init1: 151 Initn: 151 Opt: 241 

76.0% identity in 104 bp overlap 

260 270 280 290 300 310 

M3-6.S CTGGTGAAGCGGCGCCnCAGCAGGGAAAAGTGACMnGCfGAGCnATCCAGCCTGCT 

Hb5p2t ncCGGG&t(^wti(^ 

10 20 30 

320 330 340 350 360 370 

M3-6.S ACACGGAGCCAGGTGGATGCCACAGAACCACAGGATGT AGTGGTGTATGACCAGAGCACA 

40 50 60 70 80 90 

380 390 400 410 420 430 

M3-6.S CGAGATGCCAGCGTGCTGGCAGCAGACAGCHCCTGTCCATCCTGCTCAGCAAGCrGGAC 

Hb5p2t i<^G<Ul-CNiAii^rf 

100 



SmaI subclone of HB5a/3 #2 sequenced with T7 17/7/95 

SCORES Init1: 151 Initn: 151 Opt: 236 

77.9% identity in 95 bp overlap 

260 270 280 290 300 310 

M3-6.S CTGGTGMGCGGCGCCTTCAG(^GGGAAAAGT^^ 

Hb5s2t 

10 20 30 

320 330 340 350 360 370 

M3-6.S ACACG6AGCCAGGTGGATGCCACAGAACCACAGGATGTAGTG6TGTATGACCAGAGCACA 

40 50 60 70 80 90 

380 390 400 410 420 430 

M3-6.S CGAGATGCCAGC^TGCTGGWGCAGAWGfTTCCTGTCCATCCreCTWG 

Hb5s2t cir^A 
SCORES Init1: 133 Initn: 133 Opt: 224 

69.8% identity in 116 bp overlap 

1240 1250 1260 1270 1280 1290 

M3-6.S CCTCTGACCGCCTmGGACAC^ 

Hb5p1t miHii 

10 20 

1300 1310 1320 1330 1340 1350 
M3-6.S CG(XCTATGCACraGCA(£A^ 

Hb5p1t aiil\kddM 

30 40 50 60 70 80 

1360 1370 1380 1390 1400 1410 

M3-6.S CCCAAGCTCT- -^IAAGCT-GA(>GCC^GTCTGGGGGNACACTGGGCCT6CCCTCGCCCAG 

Hb5p1t iiGAAiil^CGNJUL^ilG^^JUrN^TCG 
90 100 110 



SCORES Init1: 129 Initn: 129 Opt: 213 

72.0% identity in 100 bp overlap 

1000 1010 1020 1030 1040 1050 
M3-6.S CTGMGCTGCTGGCTGCCCTGWGACTGATGGACCTCACTTGGGGACCCCTGAGCCCCTC 



Hb5p3t MCCGii^ 

10 20 30 

1060 1070 1080 1090 1100 1110 
M3-6.S ATGt^CC(^G(^(XATCCtt^^ 

Hb5p3t CCCAiTdiTM«mMMM«il 
40 50 60 70 80 90 

1120 1130 1140 1150 1160 1170 
M3-6.S GCCACTGGGAGC(^GGWGCCACCGCAGCW^ 

Hb5p3t (^(IJI^MJUtNCNNCTNCAGGGAGG 
100 110 120 
SCORES Init1: 129 Initn: 129 Opt: 196 

66.9% identity in 127 bp overlap 

1239 1229 1219 1209 1199 1189 

M3-6.S CGCAGGCCTGCTGCAGCCGCGCTGGTGGCTGGAGCT6TGCTGGG6ATCGGAGCATCCCT C 

Hb5p3t G^I^-i^NCGAyGcliiicG 

10 20 

1179 1169 1159 1149 1139 1129 
M3-6.S (^GCACTTGGGCTGCCCTCCCTGGCTGCOT 

»** y^l«UM IMkUKMI 

30 40 50 60 70 

1119 1109 1099 1089 1079 1069 
M3-6.S CTGAGGTAGATGGTGGCAGCCGGGGCAGTGGGATGCCTGCTGCCGGGCCCATGAGGGGCT 

80 90 100 110 

1059 1049 1039 1029 1019 1009 
M3-6.S CAGGGGTCCCCAAGTGAGGTCCATCAGTCTGCAGGGCAGCCAGCAGCTTCAGACTCCTCT 



SCORES Init1: 112 Initn: 112 Opt: 184 

83.1% identity in 65 bp overlap 

1609 1599 1589 1579 1569 1559 

M3-6.S QCCGGCTGGCAGGGCCAGGCAGCCCGGGCGCCGACAGGGCCGAGA-GSSTGCCGTGGAGT 

Hb5s4t 

10 20 30 

1549 1539 1529 1519 1509 1499 
M3-6.S CTGCCGGGCCGTGTCTCCAAAGnCAGGCCCAGACWTGCGCGGGGG^CGGGCAGGCGA 

Hb5s4t 

40 50 60 

1489 1479 1469 1459 1449 1439 
M3-6.S ACTAGCCGGGGGGGCGTCGCGGCGGGGTCGTGGGCGGCACTCTGGAACGGAGTCCGGGCT 
SCORES Init1: 67 Initn: 67 Opt: 91 

58.6% identity in 87 bp overlap 

1310 1320 1330 1340 1350 
M3-6.S GCCTATGCACCWGCAGGAGGCCt^CTTTCCQaGCCC^CNGACCCCCGG- - " 

Hb5s5t COWOGWCCX^ 

10 20 30 40 

1360 1370 1380 1390 1400 1410 

M3-6.S CCCCAAGCTCTNAAGCTGACAGCCKGTCTGGGG-GNACACTG^CTKCCTCG^ 

Hb5s5t y<y^^ 

50 60 70 80 90 100 

1420 1430 1440 1450 1460 1470 
M3-6.S CCAGA(^CCG(^aCCGnCD^ 

Hb5s5t C 



SmaI subclone of HB5a/3 #2 sequenced with T3 17/7/95 

SCORES Init1: 44 Initn: 44 Opt: 62 

53.5% identity in 114 bp overlap 

250 260 270 280 290 300 

M3-6.S nrAAAGCTGGTGAAGCGGCGCCTTCAGCAGGGAAMGTGACMT - TGCTGAGCTTATCC 

Hb5s2t TCCAGO^(£CCW(^^ 

30 40 50 60 70 80 

310 320 330 340 350 360 

M3-6.S AGCCTGCTACACGGAGCCAGGTGG - - ATGCWWGAACW WGGATGTAGTGGTGTATGA 

Hb5s2t AA^lUiGt^ 

90 100 110 120 130 140 

370 380 390 400 410 420 

M3-6.S --CWGAGCAWCt^ra^ 

Hb5s2t TGGTG(k:(UG^G(y^t^^^CAGACAGTGGACNA^ 
150 160 170 180 